More Speed and More Compression: Accelerating Pattern Matching by Text Compression

Authors

  • Tetsuya Matsumoto
  • Kazuhito Hagio
  • Masayuki Takeda
Abstract

This paper addresses the problem of speeding up string matching by text compression, and presents a compressed pattern matching (CPM) algorithm which finds a pattern within a text given as a collage system 〈D,S〉 such that the variable sequence S is encoded by byte-oriented Huffman coding. The compression ratio is high compared with existing CPM algorithms addressing the same problem, and the search time reduction ratio relative to the Knuth-Morris-Pratt algorithm over uncompressed text is nearly the same as the compression ratio.
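
For orientation, the uncompressed baseline named in the abstract can be sketched as follows. This is the textbook Knuth-Morris-Pratt algorithm, not the paper's CPM algorithm; the function name `kmp_search` and the choice of `bytes` input are assumptions made for illustration.

```python
def kmp_search(text: bytes, pattern: bytes) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []

    # failure[i] = length of the longest proper border of pattern[:i + 1]
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    # Scan the text once; k is the number of pattern bytes currently matched.
    matches = []
    k = 0
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = failure[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # continue, allowing overlapping matches
    return matches
```

The scan touches each text byte once, so its running time grows with the length of the uncompressed text. That is the cost a CPM algorithm aims to reduce by scanning the shorter compressed representation instead.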

Related resources

Pattern Matching Machine for Text Compressed Using Finite State Model

The classical pattern matching problem is to find all occurrences of patterns in a text. In many practical cases, since the text is very large and kept in secondary storage, most of the pattern matching time is spent on data transmission of the text. Therefore, text compression can speed up the pattern matching. In this framework it is required to develop an efficient pattern m...

Correction to "lossless, near-lossless, and refinement coding of bilevel images"

We present general and unified algorithms for lossy/lossless coding of bilevel images. The compression is realized by applying arithmetic coding to conditional probabilities. As in the current JBIG standard the conditioning may be specified by a template. For better compression, the more general free tree may be used. Loss may be introduced in a preprocess on the encoding side to increase compr...

Speed-up of Aho-Corasick Pattern Matching Machines by Rearranging States

This paper describes a speed-up of string pattern matching obtained by rearranging states in the Aho-Corasick pattern matching machine, which is a kind of finite automaton. We realized the speed-up of string pattern matching using data compression. Although a finite state model yields a higher compression ratio, it does not lead to a speed-up of string pattern matching, because the pattern matching machine beco...

Byte pair encoding : a text compression scheme that accelerates pattern matching

Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires small work space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been so popular since the compression is rather slow and the compression ratio is not as good as other methods such as Lempel-Ziv type compression. In this paper, we bring ...
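
A minimal sketch of the general byte pair encoding idea, assuming the classic formulation in which the most frequent adjacent byte pair is repeatedly replaced by a byte value that does not occur in the data. The names `bpe_compress` and `bpe_decompress` and the rule-table layout are hypothetical, and this is not the implementation evaluated in the paper.

```python
from collections import Counter


def bpe_compress(data: bytes) -> tuple[bytes, dict[int, tuple[int, int]]]:
    """Replace frequent adjacent byte pairs with byte values unused in data."""
    seq = list(data)
    present = set(seq)
    unused = [b for b in range(256) if b not in present]
    rules: dict[int, tuple[int, int]] = {}

    while unused and len(seq) > 1:
        pair, count = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so a further rule would not help
        code = unused.pop()
        rules[code] = pair

        # Rewrite the sequence left to right, substituting the chosen pair.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(code)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out

    return bytes(seq), rules


def bpe_decompress(data: bytes, rules: dict[int, tuple[int, int]]) -> bytes:
    """Expand codes back into pairs using an explicit stack."""
    out = []
    stack = list(reversed(data))
    while stack:
        b = stack.pop()
        if b in rules:
            left, right = rules[b]
            stack.append(right)
            stack.append(left)
        else:
            out.append(b)
    return bytes(out)
```

Decompression only needs the rule table and a small stack, and each compressed byte expands independently of its neighbours. This is consistent with the abstract's remarks that decompression is fast, needs little work space, and can start from an arbitrary part of the text.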

Speeding Up Pattern Matching by Text Compression

Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires small work space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been so popular since the compression is rather slow and the compression ratio is not as good as other methods such as Lempel-Ziv type compression. In this paper, we bring ...

Publication date: 2007